Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include stats in IPC messages #302

Merged
merged 20 commits into from
May 8, 2024
Merged

Include stats in IPC messages #302

merged 20 commits into from
May 8, 2024

Conversation

jdcasale
Copy link
Contributor

@jdcasale jdcasale commented May 7, 2024

  • Modifies the flatbuffer array type to include an ArrayStats table
    • Array Table contains optional fields corresponding to each currently available stats type
    • Current implementation populates all stats
  • Implementation of the Statistics trait for ArrayView
  • Implementation does no allocations or computation, only references values that exist in the underlying flatbuffer
  • For demonstrative purposes, I've written (and removed) an implementation that allocates only if someone calls set(stat, value) to populate additional, possibly missing, stats. I've removed this because we don't currently have a use for it, but it's easy enough to do without any unsafe shenanigans.
  • Callers can specify which stats should be included with a serialized IPC array when constructing a ViewContext. By default, all stats are included.
  • Tests demonstrating the presence of correct stats after a round-trip through IPC for primitive and chunked arrays

I've included a mechanism to configure all of the statistics by default here because the overhead they add to the flatbuffer message is relatively small, given that the arrays themselves are sufficiently large. I considered adding a mechanism to check the length of the arrays here to choose a subset of stats based on the size of the array (probably just drop the two frequency arrays, because they're much larger than everything else), but decided against it for now. I don't think we expect to frequently see arrays small enough that these stats would add a relatively significant amount of wire overhead

vortex-array/src/view.rs Outdated Show resolved Hide resolved
vortex-array/flatbuffers/array.fbs Outdated Show resolved Hide resolved
vortex-array/src/array/chunked/mod.rs Outdated Show resolved Hide resolved
deps/fastlanez Outdated Show resolved Hide resolved
vortex-array/src/stats/mod.rs Outdated Show resolved Hide resolved
vortex-array/src/view.rs Outdated Show resolved Hide resolved
vortex-array/src/view.rs Outdated Show resolved Hide resolved
vortex-array/src/view.rs Outdated Show resolved Hide resolved
vortex-ipc/src/messages.rs Outdated Show resolved Hide resolved
vortex-ipc/src/messages.rs Outdated Show resolved Hide resolved
children,
},
)
}
}

/// Computes all stats and uses the results to create an ArrayStats table for the flatbuffer message
fn compute_and_build_stats<'a>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't let the writer configure which stats are written. Can you think of a clean way to support that?

Copy link
Contributor Author

@jdcasale jdcasale May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ViewContext is the right place to configure this, added some config there

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's right. ViewContext represents the context required to interpret an array view at read-time. The stats configuration is not part of that.

I would actually say this isn't the job of this file. This file can just serialise the array and any stats that it has already computed. The caller (in this case, the IPCWriter) can set/clear whatever stats it likes before serialisation.

Self { encodings, stats }
}

pub fn set_stats(&mut self, to_enable: &[Stat]) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this set_stats method so that we can default to including stats but allow a caller to smash over this with any combination of stats they want.

The default mechanism is not implemented because there is an open question as to what the right set of default encodings is. I think using default stats in the from() method below is reasonable behavior, given that it can be modified afterwards.

@jdcasale jdcasale marked this pull request as ready for review May 7, 2024 14:35
_ => None,
};

let mut frequencies: HashMap<_, _> = to_compute
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This map/collect pattern was introduced to avoid a bunch of if-contains/else, not sure if I like it better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's weird...

You can just create a mut fb::ArrayStatsArgs and then for each stat, get value as scalar, then match the stat and cast if required.

@jdcasale jdcasale marked this pull request as draft May 7, 2024 15:28
@jdcasale
Copy link
Contributor Author

jdcasale commented May 7, 2024

Actually I might have needed the eager merging of chunked stats.

@jdcasale
Copy link
Contributor Author

jdcasale commented May 7, 2024

Ok so the original reason I eagerly merged stats for ChunkedArray was that it didn't work otherwise -- I just assumed that it wasn't implemented. There was an implementation, it was just subtly wrong, which meant that calculating Min would never work.

The issue was with fold vs reduce -- fold takes an empty stats array as a base case, and if you merge an empty statsmap with something that has a Min entry, the result has no Min Entry:

e.g. the following panics

        let mut set2 = StatsSet::new();
        set2.set(Stat::Min, 0.into());
        set1.merge(&set2);
        set1.get(Stat::Min).unwrap();

Because the empty stats map isn't there for any reason other than to give us a base case if there are no stats, replacing the fold with a reduce/unwrap_or_else fixes the problem

@jdcasale jdcasale closed this May 7, 2024
@jdcasale jdcasale reopened this May 7, 2024
@jdcasale jdcasale marked this pull request as ready for review May 7, 2024 16:15
@robert3005
Copy link
Member

I knew I should have written tests for stats merging

vortex-array/src/view.rs Outdated Show resolved Hide resolved
vortex-array/src/view.rs Outdated Show resolved Hide resolved
vortex-ipc/src/messages.rs Outdated Show resolved Hide resolved
children,
},
)
}
}

/// Computes all stats and uses the results to create an ArrayStats table for the flatbuffer message
fn compute_and_build_stats<'a>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's right. ViewContext represents the context required to interpret an array view at read-time. The stats configuration is not part of that.

I would actually say this isn't the job of this file. This file can just serialise the array and any stats that it has already computed. The caller (in this case, the IPCWriter) can set/clear whatever stats it likes before serialisation.

_ => None,
};

let mut frequencies: HashMap<_, _> = to_compute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's weird...

You can just create a mut fb::ArrayStatsArgs and then for each stat, get value as scalar, then match the stat and cast if required.

vortex-array/src/array/chunked/stats.rs Show resolved Hide resolved
vortex-array/src/view.rs Show resolved Hide resolved
vortex-ipc/src/messages.rs Outdated Show resolved Hide resolved
vortex-ipc/src/messages.rs Outdated Show resolved Hide resolved
vortex-ipc/src/messages.rs Outdated Show resolved Hide resolved
vortex-ipc/src/messages.rs Outdated Show resolved Hide resolved
@jdcasale jdcasale merged commit 59d4db0 into develop May 8, 2024
2 checks passed
@jdcasale jdcasale deleted the jc/ipc-stats branch May 8, 2024 19:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants